Monolingual Experiments with Far-East Languages in NTCIR-6

Authors

  • Samir Abdou
  • Jacques Savoy
Abstract

This paper describes our third participation in an evaluation campaign involving the Chinese, Japanese and Korean languages (NTCIR-6). Our participation is motivated by three objectives: 1) to study the retrieval performance of various probabilistic and language models for these languages; 2) to compare the relative retrieval effectiveness of a “unigram & bigram” indexing scheme combined with an automatic word-segmenting approach for the Chinese and Japanese languages; and 3) to evaluate the relative performance of the various data fusion strategies used to combine separate result lists in order to enhance retrieval effectiveness.
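As a rough illustration of two ideas named in the abstract, the sketch below shows one way a “unigram & bigram” indexing scheme and a data fusion step could look. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the indexing function simply treats every non-space character as a unigram and every adjacent pair as a bigram, and the fusion operator is CombSUM over min-max normalized scores, a well-known strategy chosen here for illustration only (the abstract does not say which strategies were evaluated).

```python
from __future__ import annotations

from collections import defaultdict


def unigram_bigram_terms(text: str) -> list[str]:
    """Return every single character followed by every overlapping character pair.

    Assumption: whitespace is dropped and no script detection is done, so this
    is only meaningful for text written without word delimiters.
    """
    chars = [ch for ch in text if not ch.isspace()]
    bigrams = [a + b for a, b in zip(chars, chars[1:])]
    return chars + bigrams


def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one result list's scores to the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # All scores are equal; give every document the same weight.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def comb_sum(runs: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Fuse several result lists by summing their normalized scores (CombSUM)."""
    fused: defaultdict[str, float] = defaultdict(float)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] += score
    # Highest fused score first; ties are broken by document identifier.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

In this sketch, a document missing from one result list simply contributes nothing from that list, which is one common convention among several.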

Similar resources

MIRACLE Retrieval Experiments with East Asian Languages

This paper describes the participation of MIRACLE in the NTCIR 2005 CLIR task. Although our group has a strong background and extensive expertise in Computational Linguistics and Information Retrieval applied to European languages written in the Latin and Cyrillic alphabets, this was our first attempt at East Asian languages. Our main goal was to study the particularities and distinctive characteristics of J...


Experiments in the Retrieval of Unsegmented Japanese Text at the NTCIR-2 Workshop

Our work with the Hopkins Automated Information Retriever for Combing Unstructured Text (HAIRCUT) system has made use of overlapping character n-grams in the indexing and retrieval of text. In previous experiments with Western European languages we have shown that longer length n-grams (e.g., n=6) are capable of providing an effective form of alinguistic term normalization. We have wanted to in...


NTCIR-6 CLIR-J-J Experiments at Yahoo! Japan

This paper describes the NTCIR-6 experiments on the CLIR-J-J task, i.e., the Japanese monolingual retrieval subtask, by the Yahoo group, focusing on parameter optimization in information retrieval (IR). Unlike regression approaches, we optimized parameters completely independently of retrieval models so that the optimized parameter set can illustrate the characteristics of the target test collections...


NTCIR-6 Monolingual Chinese and English-Chinese Cross Language Retrieval Experiments using PIRCS

In NTCIR-6, our Stage-1 results, which consist of using old queries to retrieve from a different old collection, were not official because of late submission. Stage-2 submissions, which consist of repeating previous experiments, were on time. These NTCIR-6 experiments were conducted as new, without referring to any previous knowledge about the runs. Comparisons with old results, however, were less fa...


NTCIR-2 ECIR Experiments at Maryland: Comparing Structured Queries and Balanced Translation

Pirkola’s structured queries have been shown to perform well for word-based cross-language information retrieval in European languages, but in monolingual Chinese retrieval experiments it is often found that character bigrams perform as well as, and sometimes better than, automatically segmented words. During the Mandarin-English Information (MEI) project at the Johns Hopkins Summer 2000 Worksh...




Publication date: 2007